
Standardize on a single fast hasher, i.e. rustc-hash #2280

Closed
wants to merge 1 commit

Conversation

@adamreichold (Collaborator)

We currently use both fnv and rustc-hash which basically fulfill the same role of very fast but low quality hashes. Since rustc-hash was originally created as a more modern replacement for fnv, let's standardize on that one.

@PSeitz (Contributor) commented Jan 18, 2024

I don't think FxHashMap is just a better version of FnvHashMap. In my experience, fxhash works well for numbers and FNV for short strings.

@adamreichold (Collaborator, Author)

> I don't think FxHashMap is just a better version of FnvHashMap. In my experience, fxhash works well for numbers and FNV for short strings.

Yes, it is not just a better version. I do think the main aim when designing fxhash/rustc-hash was to be a modern FNV replacement though, i.e. a really fast but low-quality hash, and their main structural difference AFAIK is that fxhash works at the word level whereas FNV works at the byte level.
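For reference, the byte-level vs. word-level distinction can be sketched as follows. This is not the crates' actual code, just a minimal sketch using the published 64-bit FNV-1a parameters and the rotate-xor-multiply round used by older rustc-hash versions:

```rust
// Published 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
// Multiplier used by the classic fxhash round.
const FX_SEED: u64 = 0x517c_c1b7_2722_0a95;

// FNV-1a: one round per *byte* of input (xor, then multiply).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

// Fx-style mixing: one round per *word* of input (rotate, xor, multiply).
fn fx_round(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(FX_SEED)
}

fn main() {
    println!("fnv1a(\"a\") = {:#x}", fnv1a(b"a"));
    println!("fx_round(0, 'a') = {:#x}", fx_round(0, 0x61));
}
```

So for an n-byte input, FNV always does n rounds, while the Fx scheme only needs roughly n / 8 rounds on a 64-bit machine, at the cost of extra work to assemble the words.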

Looking at both sources, it seems fxhash's Hasher::write implementation is much more complicated because it tries hard to process usize-sized chunks of the incoming data, a lot of work which will probably be wasted if the input is smaller than e.g. 8 bytes. I will look into this, try to improve things for short strings, and report back here whether that was successful or not.

@adamreichold (Collaborator, Author)

@PSeitz I started by trying to isolate the difference between fxhash and FNV using a benchmark at https://github.com/adamreichold/rustc-hash/tree/small-strings but I get a lead for FNV only for very short strings (3 bytes):

test long_strings_fnv   ... bench:      38,805 ns/iter (+/- 356)
test long_strings_fx    ... bench:      12,169 ns/iter (+/- 84)
test medium_strings_fnv ... bench:      13,653 ns/iter (+/- 127)
test medium_strings_fx  ... bench:       9,671 ns/iter (+/- 46)
test small_strings_fnv  ... bench:       9,778 ns/iter (+/- 106)
test small_strings_fx   ... bench:       8,600 ns/iter (+/- 86)
test tiny_strings_fnv   ... bench:       7,283 ns/iter (+/- 58)
test tiny_strings_fx    ... bench:       8,115 ns/iter (+/- 95)

And for long strings (47 bytes), fxhash's lead becomes quite significant. Do you think this is the signal you were referring to, or would you recommend different parameters for the benchmark (1,000 HashSet entries is probably not very much)?

@adamreichold (Collaborator, Author)

(Note that in the above, I tried to pinpoint sizes where fxhash should do worse than FNV, e.g. 3 bytes means two rounds for fxhash whereas both 2 and 4 would do with a single round. Similarly, 7 bytes implies 3 rounds whereas 6 does with 2 and 8 with 1, but already here fxhash seems to pull ahead.)
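A hypothetical sketch of the round counting implied above, assuming the classic fxhash write loop (usize-sized chunks first, then a 4/2/1-byte tail ladder):

```rust
// Count how many mixing rounds the classic fxhash write loop performs for a
// given input length: u64 chunks while possible, then a u32, u16, and u8 tail.
// This reproduces the sizes discussed above: 3 bytes -> 2 rounds, while
// both 2 and 4 bytes need only 1; 7 bytes -> 3 rounds, 6 -> 2, 8 -> 1.
fn fx_rounds(mut len: usize) -> usize {
    let mut rounds = 0;
    while len >= 8 {
        len -= 8;
        rounds += 1; // one u64 chunk
    }
    if len >= 4 {
        len -= 4;
        rounds += 1; // u32 tail
    }
    if len >= 2 {
        len -= 2;
        rounds += 1; // u16 tail
    }
    if len >= 1 {
        rounds += 1; // final byte
    }
    rounds
}

fn main() {
    for len in [2, 3, 4, 6, 7, 8] {
        println!("{len} bytes -> {} rounds", fx_rounds(len));
    }
}
```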

@PSeitz (Contributor) commented Jan 18, 2024

Measuring hashes is pretty hard, as you'd need to account for both performance and hash collisions on relevant datasets; hash collisions also depend on the hash map size.
Since strings are typically not all the same length, using fixed-length inputs may circumvent effects from the branch predictor.

The performance should be measured for insertion, as what matters is how the code performs in the context of a HashMap insert.

@adamreichold (Collaborator, Author)

> Measuring hashes is pretty hard, as you'd need to account for both performance and hash collisions on relevant datasets; hash collisions also depend on the hash map size.

Certainly, but the microbenchmark does seem to capture the signal you were referring to so I guess it is not totally off the mark. I can certainly vary the hash table size to see if this makes a difference.

> Since strings are typically not all the same length, using fixed-length inputs may circumvent effects from the branch predictor.

I will also augment the microbenchmark to use strings up to the given length instead of using a fixed length.

> The performance should be measured for insertion, as what matters is how the code performs in the context of a HashMap insert.

Shouldn't looking up all keys in the hash table already be a good proxy for the number of collisions? I do care about collisions, which is why I did not just test the bare hashing. But insertion always has the problem that the allocation part cannot be removed from the measurement loop.
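The lookup-as-proxy idea can be sketched like this; the hasher, key shapes, and sizes are placeholders, not the actual benchmark:

```rust
use std::collections::HashSet;
use std::time::Instant;

// Count how many of the given keys are present in the set. Collisions still
// cost probing time here, but no allocation happens in the timed section.
fn lookup_all(set: &HashSet<String>, keys: &[String]) -> usize {
    keys.iter().filter(|k| set.contains(*k)).count()
}

fn main() {
    let keys: Vec<String> = (0..1000).map(|i| format!("key-{i}")).collect();
    // Populate the table *outside* the timed section, so allocation and
    // growth are excluded from the measurement.
    let set: HashSet<String> = keys.iter().cloned().collect();

    let start = Instant::now();
    let found = lookup_all(&set, &keys);
    println!("found {found} keys in {:?}", start.elapsed());
}
```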

FWIW, I made a simple change which closes the gap even for tiny strings:

test long_strings_fnv   ... bench:      37,515 ns/iter (+/- 50)
test long_strings_fx    ... bench:      12,152 ns/iter (+/- 75)
test medium_strings_fnv ... bench:      13,651 ns/iter (+/- 34)
test medium_strings_fx  ... bench:      10,352 ns/iter (+/- 49)
test small_strings_fnv  ... bench:       9,703 ns/iter (+/- 36)
test small_strings_fx   ... bench:       9,349 ns/iter (+/- 25)
test tiny_strings_fnv   ... bench:       7,157 ns/iter (+/- 23)
test tiny_strings_fx    ... bench:       7,341 ns/iter (+/- 51)

at the cost of breaking value stability, which we probably do not care about at all, but I will have to ask how upstream feels about that.

And finally, using a bit of unsafe code as a polyfill for the still-unstable slice::as_chunks, the modified fxhash pulls ahead reliably even for tiny strings, as it generally becomes faster:

test long_strings_fnv   ... bench:      37,581 ns/iter (+/- 137)
test long_strings_fx    ... bench:      11,183 ns/iter (+/- 16)
test medium_strings_fnv ... bench:      13,624 ns/iter (+/- 22)
test medium_strings_fx  ... bench:       9,137 ns/iter (+/- 61)
test small_strings_fnv  ... bench:       9,736 ns/iter (+/- 26)
test small_strings_fx   ... bench:       8,170 ns/iter (+/- 50)
test tiny_strings_fnv   ... bench:       7,249 ns/iter (+/- 25)
test tiny_strings_fx    ... bench:       6,861 ns/iter (+/- 32)
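For illustration, a safe stand-in for such an as_chunks polyfill can be built on the stable chunks_exact; the unsafe variant mentioned above would additionally avoid the per-chunk conversion checks. The function name and shape here are mine, not the actual patch:

```rust
// Split a byte slice into N-byte array chunks plus a remainder, mirroring
// the still-unstable slice::as_chunks, but using only safe, stable APIs.
fn as_chunks_polyfill<const N: usize>(bytes: &[u8]) -> (Vec<[u8; N]>, &[u8]) {
    let iter = bytes.chunks_exact(N);
    // remainder() borrows from the original slice, so it stays valid after
    // the iterator is consumed below.
    let remainder = iter.remainder();
    let chunks = iter.map(|c| c.try_into().unwrap()).collect();
    (chunks, remainder)
}

fn main() {
    let data = [1u8, 2, 3, 4, 5, 6, 7, 8, 9];
    let (chunks, rest) = as_chunks_polyfill::<4>(&data);
    println!("{} chunks, {} leftover bytes", chunks.len(), rest.len());
}
```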

@adamreichold (Collaborator, Author)

I refined the benchmarks; for the current fxhash, this adds three situations where FNV beats it (few_tiny as before, and some_small, though that lead is really tiny):

test few_large::fnv   ... bench:      19,621 ns/iter (+/- 120)
test few_large::fx    ... bench:       9,355 ns/iter (+/- 166)
test few_medium::fnv  ... bench:       9,356 ns/iter (+/- 98)
test few_medium::fx   ... bench:       7,898 ns/iter (+/- 115)
test few_small::fnv   ... bench:       7,300 ns/iter (+/- 60)
test few_small::fx    ... bench:       7,337 ns/iter (+/- 109)
test few_tiny::fnv    ... bench:       6,936 ns/iter (+/- 50)
test few_tiny::fx     ... bench:       7,513 ns/iter (+/- 84)
test many_large::fnv  ... bench:   3,883,669 ns/iter (+/- 26,968)
test many_large::fx   ... bench:   2,233,341 ns/iter (+/- 14,412)
test many_medium::fnv ... bench:   2,071,061 ns/iter (+/- 14,305)
test many_medium::fx  ... bench:   1,677,142 ns/iter (+/- 10,292)
test many_small::fnv  ... bench:   1,471,003 ns/iter (+/- 25,075)
test many_small::fx   ... bench:   1,336,677 ns/iter (+/- 8,678)
test some_large::fnv  ... bench:     309,672 ns/iter (+/- 22,118)
test some_large::fx   ... bench:     196,661 ns/iter (+/- 4,512)
test some_medium::fnv ... bench:     158,861 ns/iter (+/- 2,246)
test some_medium::fx  ... bench:     151,262 ns/iter (+/- 3,039)
test some_small::fnv  ... bench:     119,939 ns/iter (+/- 3,410)
test some_small::fx   ... bench:     120,153 ns/iter (+/- 749)

After the value-stability-breaking change, one situation is left where fxhash actually regressed (many_small):

test few_large::fnv   ... bench:      19,846 ns/iter (+/- 174)
test few_large::fx    ... bench:       8,206 ns/iter (+/- 70)
test few_medium::fnv  ... bench:       9,318 ns/iter (+/- 94)
test few_medium::fx   ... bench:       6,917 ns/iter (+/- 71)
test few_small::fnv   ... bench:       7,288 ns/iter (+/- 69)
test few_small::fx    ... bench:       6,679 ns/iter (+/- 69)
test few_tiny::fnv    ... bench:       6,884 ns/iter (+/- 82)
test few_tiny::fx     ... bench:       6,421 ns/iter (+/- 72)
test many_large::fnv  ... bench:   3,838,407 ns/iter (+/- 67,395)
test many_large::fx   ... bench:   1,912,792 ns/iter (+/- 21,136)
test many_medium::fnv ... bench:   2,079,771 ns/iter (+/- 16,490)
test many_medium::fx  ... bench:   1,580,259 ns/iter (+/- 19,684)
test many_small::fnv  ... bench:   1,482,612 ns/iter (+/- 13,102)
test many_small::fx   ... bench:   1,634,143 ns/iter (+/- 15,159)
test some_large::fnv  ... bench:     317,275 ns/iter (+/- 20,146)
test some_large::fx   ... bench:     169,852 ns/iter (+/- 1,669)
test some_medium::fnv ... bench:     158,647 ns/iter (+/- 4,668)
test some_medium::fx  ... bench:     135,808 ns/iter (+/- 556)
test some_small::fnv  ... bench:     120,479 ns/iter (+/- 1,543)
test some_small::fx   ... bench:     115,738 ns/iter (+/- 1,121)

I think with these results, a final decision should be based on benchmarking Tantivy with the above-linked changes. But in any case, I will first propose this upstream and ask what they think, as we surely do not want to maintain a bespoke fxhash variant.

@fulmicoton (Collaborator)

@adamreichold do you have a tantivy or a quickwit benchmark that shows an improvement?

@adamreichold (Collaborator, Author)

I am currently going for the gold standard in hash table benchmarks, i.e. rustc. ;-) rust-lang/rust#120093

If their benchmark infrastructure says this is interesting, then I'll try to get it merged upstream and come back here. If the rustc benchmarks indicate this is pointless, I will close this here as well.

@adamreichold (Collaborator, Author)

(Just to clarify, I am not trying to say that this will yield some product-level performance change or anything. I basically came here purely for long-term, small-scale maintenance, because I thought "depending on one fast hash crate is better than depending on two". But if fxhash's problems with short strings block that and can be resolved to make that crate more widely useful, then that would be great IMHO.)

@adamreichold (Collaborator, Author)

rustc's benchmarks indicated that merging the final rounds of fxhash to improve its speed for really short strings also decreases its quality, so that there are a few small regressions in rustc's symbol interning. So I guess we will continue to mix and match both fxhash and FNV. (One other relevant candidate I see is AHash, but AFAIK it is generally considered more heavyweight than both fxhash and FNV, even though of higher quality. Checking whether it provides a net gain on machines with AES hardware instructions might be worth looking into.)

@adamreichold deleted the single-fast-hasher branch on January 19, 2024
@PSeitz (Contributor) commented Jan 19, 2024

Yes, ahash is one of the examples where a fast hash algorithm on its own does not perform as well in the context of a hashmap insert.
I don't know if it's due to pressure on the CPU's execution slots or higher instruction-cache pressure, but I'd guess one of the two.
